We consider the problem of the transmission of analog data from a
Gaussian source over a memoryless channel with capacity $C$ nats per
second. The source emits $R$ independent zero-mean Gaussian variates
per second with variance $\sigma^2$. These digits are block-coded $RN$ at a
time into $N$-second channel inputs. The performance criterion is the mean
square error. Let $\epsilon^2(N)$ be the smallest attainable mean square error
with parameter $N$ ($R$, $C$, $\sigma^2$ fixed). Shannon has shown that
$\epsilon^2(N) \ge \sigma^2 \exp(-2C/R) = \epsilon_0^2$ and $\epsilon^2(N) \to \epsilon_0^2$ as $N \to \infty$.
Hence the ideal error $\epsilon_0^2$ is attainable in the limit as the coding
delay $N \to \infty$. We are concerned with the rate at which
$\epsilon^2(N) \to \epsilon_0^2$, and our principal result is that
$\epsilon^2(N) - \epsilon_0^2 \le O[(\log N/N)^{1/2}]$.

I. INTRODUCTION 

We are interested in the following problem. Suppose we have an 
analog data source which emits a sequence of statistically independent 
Gaussian variates at a rate of R per second. We wish to transmit this 
data through a noisy channel of capacity C nats per second. Our prob- 
lem is the determination of the minimum possible mean-squared-error. 

Specifically we shall study the communication system of Figure 1. 
The output of the analog source is a sequence $X_1, X_2, X_3, \ldots$ of
statistically independent Gaussian variates with zero mean and variance
$\sigma^2$ which appear at the coder input at a rate of $R$ per second. After $N$
seconds, $n = NR$ source variates have accumulated at the coder input.
Let $X$ denote this random $n$-vector. The channel is a discrete memoryless
channel* which we assume accepts one input per second, and the coder
contains a mapping of $X$ to an allowable channel input $N$-vector $S$.
Since it requires $N$ seconds to transmit $S$, the system can process the
data continuously without a "backup" at the coder input.

* Actually our results are valid for a broader class of channels. See the remark 
after Theorem 2 in Section II. 
The decoder examines the channel output $N$-vector $R$ and emits a
Euclidean $n$-vector $X'$ which is hopefully "close" to $X$. The error
criterion which we adopt is the mean-squared error

$$\epsilon^2 = \frac{1}{n} E \| X - X' \|^2, \tag{1}$$

where "$\| \cdot \|$" is the Euclidean norm and $E$ denotes expectation.

Fig. 1. Communication System.

We shall assume that the parameters $\sigma^2$, $R$, and $C$ are held fixed
for this entire paper, and denote by $\epsilon^2(N)$ the smallest value of $\epsilon^2$
attainable with parameter $N$ (and therefore $n = RN$). Shannon (Ref. 1) has
shown that

$$\epsilon^2(N) \ge \sigma^2 \exp(-2C/R) = \epsilon_0^2 \quad \text{and} \quad \epsilon^2(N) \to \epsilon_0^2 \ \text{as } N \to \infty, \tag{2}$$

so that $\epsilon^2 = \epsilon_0^2$ is attainable in the limit as the delay $N \to \infty$. We are
concerned here with the rate at which $\epsilon^2(N)$ approaches the ideal $\epsilon_0^2$,
and our principal result is



$$\epsilon^2(N) \le \epsilon_0^2 \left[ 1 + \frac{2}{R} \left( \frac{\log N}{\beta N} \right)^{1/2} (1 + o(1)) \right] \quad \text{as } N \to \infty, \tag{3}$$

where $\beta > 0$, a parameter related to the channel, is defined in equation
(13).

This result is related to a result of D. Sakrison (Ref. 2) which was obtained
independently.* In fact we have used one of his ideas (Lemma 1 in
our paper) to simplify our original proof.



II. STATEMENT AND DISCUSSION OF RESULTS 

Following Shannon's technique (Ref. 1), we separate the coder into two parts
as shown in Figure 2. The first part, called the source encoder or quantizer,
contains a fixed set $\mathcal{S}$ of $M$ Euclidean $n$-vectors, and associates with
each possible input $n$-vector $X$ a member of $\mathcal{S}$ (say $\hat{X}$). Let us denote
the resulting (mean-square) "quantization error" by

$$\epsilon_Q^2 = \frac{1}{n} E \| X - \hat{X} \|^2. \tag{4}$$

* This paper and Sakrison's paper were presented at the International
Symposium on Information Theory, San Remo, Italy, September 1967.

Fig. 2. Decomposition of the Coder.



The second part of the encoder, called the channel encoder, associates
with each $\hat{X}$ (one of the $M$ members of $\mathcal{S}$) a channel input $N$-vector
(say $S$). The decoder, at the receiving end of the channel, associates
with each received $N$-vector $R$ one of the members of $\mathcal{S}$ (say $X'$). Let
$P_e = \Pr\{X' \ne \hat{X}\}$ be the probability of a transmission error, and
denote the (mean-squared) transmission error by

$$\epsilon_T^2 = \frac{1}{n} E \| \hat{X} - X' \|^2. \tag{5}$$

Clearly

$$\epsilon_T^2 \le \frac{1}{n} P_e \max_{u, v \in \mathcal{S}} \| u - v \|^2. \tag{6}$$

Further, the overall error $\epsilon^2$ satisfies

$$\epsilon^2 = \frac{1}{n} E \| X - X' \|^2 \le (\epsilon_Q + \epsilon_T)^2. \tag{7}$$

Thus we want to make both $\epsilon_Q^2$ and $\epsilon_T^2$ as small as possible.

Consider the parameter $M$, the number of members of the approximating
set $\mathcal{S}$. In the interest of minimizing $\epsilon_Q^2$ we want to make $M$ large.
However, in the interest of minimizing $P_e$ and therefore $\epsilon_T^2$, we want
to make $M$ small. The proper compromise yields our result, equation (3).

The following theorems indicate just how to choose M. The first is 
proved in Section III, and the second was proved by C. E. Shannon 
(Ref. 3, p. 16). 

Theorem 1 (Source Encoding): Let $X$ be a random $n$-vector ($n = 1, 2, \ldots$)
whose components are zero-mean Gaussian variates with variance $\sigma^2$.
Let $R_0 > 0$ be given and let $\{a_n\}_{n=1}^{\infty}$ be a sequence which tends to zero. Then
there exists a set $\mathcal{S}$ of $M$ $n$-vectors, where

$$M \le \exp[n(R_0 - a_n)], \tag{8}$$

and a mapping $f$ of $X$ into $\mathcal{S}$, such that as $n \to \infty$

$$\frac{1}{n} E \| X - f(X) \|^2 \le \sigma^2 e^{-2R_0} \left( 1 + 2a_n + \frac{\log n}{n} \right) + O(a_n^2) + O\!\left( \frac{\log \log n}{n} \right). \tag{9}$$

Furthermore, $\mathcal{S}$ is such that for all $u \in \mathcal{S}$,

$$\frac{1}{n} \| u \|^2 \le \sigma^2. \tag{10}$$

A special case of some interest in itself is that for which $a_n = 0$.
In this case Theorem 1 asserts the existence of a quantization of $X$
with $M = \exp(nR_0)$ points and mean-squared quantization error no
more than $\sigma^2 \exp(-2R_0)(1 + \log n/n) + O(\log \log n/n)$ as $n \to \infty$.
If the channel is noiseless with capacity $C$ nats per second, it can transmit
$e^{CN}$ messages with no error in the $N$ seconds that it takes for the
$n$-vector $X$ to be emitted from the source. Thus if $R_0 = C/R$, $M = \exp(R_0 n) = e^{CN}$,
and the members of $\mathcal{S}$ can be reconstructed perfectly by
the receiver. The overall error is therefore

$$\epsilon^2 = \epsilon_Q^2 \le \sigma^2 \exp(-2C/R) \left( 1 + \frac{\log n}{n} \right) + O\!\left( \frac{\log \log n}{n} \right). \tag{11}$$

Let us turn now to the discrete memoryless channel defined by an
input set $(1, 2, \ldots, K)$, an output set $(1, 2, \ldots, J)$, and a set of
transition probabilities $P(j \mid k)$, $1 \le j \le J$, $1 \le k \le K$. Corresponding to
each input probability distribution $p_k$, $1 \le k \le K$, there is a joint
distribution $p(k, j) = P(j \mid k) p_k$ on the product of the input and output
sets. Define the random variable (called "information")

$$U(k, j) = \log \left[ \frac{P(j \mid k)}{\sum_{k'=1}^{K} p_{k'} P(j \mid k')} \right], \quad 1 \le k \le K, \ 1 \le j \le J. \tag{12}$$

The channel capacity $C = \max_{\{p_k\}} EU$, where the maximization is
performed with respect to all possible input probability distributions.
Let $\{p_k^*\}_{k=1}^{K}$ be a maximizing input distribution, and let $U^*(k, j)$ be
the corresponding information; then define

$$\beta = (2 \operatorname{var} U^*)^{-1} = (2 E U^{*2} - 2C^2)^{-1}. \tag{13}$$

Theorem 2 (Shannon): Let $\{b_N\}_{N=1}^{\infty}$ be a sequence which tends to zero
from above. Then there exists an $N$-dimensional code (for the channel
described above) with $M$ members,

$$M \ge \exp[N(C - b_N)], \tag{14}$$

and (for any a priori distribution on the code words) error probability

$$P_e \le k \exp(-N \beta b_N^2), \tag{15}$$

where $k$ is independent of $N$, and $\beta$ is defined by (13).

Actually we can broaden the class of channels for which our main
result (3) holds to include that class of channels for which Theorem 2
holds for some constant $\beta$. This broadened class includes the Gaussian
channel with signal-to-noise ratio $\rho$ for which $\beta = (1 + \rho)^2 / [\rho(2 + \rho)]$
(see equation 74 of Ref. 4).
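
As a rough numerical illustration of definition (13), the sketch below evaluates $C$ and $\beta$ for a binary symmetric channel. The BSC is chosen here only as an example and is not discussed in the paper. The sketch assumes that the uniform input distribution is capacity-achieving (which holds for the BSC by symmetry), and the function names are hypothetical. The Gaussian-channel expression is the one quoted above.

```python
import math


def bsc_capacity_and_beta(p):
    """Capacity C (nats per channel use) and beta = 1 / (2 var U*) for a
    binary symmetric channel with crossover probability p, 0 < p < 1/2.

    Assumption: inputs are equiprobable, so each output has probability 1/2
    and the information U* takes only two values.
    """
    u_ok = math.log(2.0 * (1.0 - p))   # U* when the symbol is received correctly
    u_err = math.log(2.0 * p)          # U* when the symbol is flipped
    mean = (1.0 - p) * u_ok + p * u_err             # C = E U*
    second = (1.0 - p) * u_ok ** 2 + p * u_err ** 2  # E U*^2
    variance = second - mean ** 2
    return mean, 1.0 / (2.0 * variance)


def gaussian_beta(rho):
    """beta for the Gaussian channel with signal-to-noise ratio rho."""
    return (1.0 + rho) ** 2 / (rho * (2.0 + rho))


if __name__ == "__main__":
    C, beta = bsc_capacity_and_beta(0.1)
    print("BSC(0.1): C =", C, "nats, beta =", beta)
    print("Gaussian, rho = 1: beta =", gaussian_beta(1.0))
```

A noisier channel has a larger variance of $U^*$ and hence a smaller $\beta$, which slows the approach to the ideal error.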

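Theorems 1 and 2 lead us directly to the proper choice of $M$ and
our main result. Since the channel encoder must encode each of the
members of $\mathcal{S}$ into channel inputs, we equate the $M$'s of Theorems 1
and 2, obtaining (from $n = NR$)

$$R_0 = C/R \quad \text{and} \quad b_N = R a_n. \tag{16}$$

If we then choose

$$b_N = \left( \frac{\log N}{\beta N} \right)^{1/2}, \tag{17}$$

where $\beta$ is defined in (13), we have from Theorem 1 a quantization error

$$\epsilon_Q^2 \le \sigma^2 \exp(-2C/R) \left( 1 + \frac{2}{R} \sqrt{\frac{\log N}{\beta N}} \right) + O\!\left( \frac{\log N}{N} \right), \tag{18}$$

and from (6), (10), and Theorem 2, a transmission error

$$\epsilon_T^2 \le 4 \sigma^2 P_e \le \frac{4 \sigma^2 k}{N}. \tag{19}$$

Thus by combining (7), (18), and (19) we have an overall mean-squared
error

$$\epsilon^2 \le \sigma^2 \exp(-2C/R) \left( 1 + \frac{2}{R} \sqrt{\frac{\log N}{\beta N}} \right) + O\!\left( \frac{1}{\sqrt{N}} \right). \tag{20}$$

This is our result (3).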
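
To get a feel for the rate of convergence, the following sketch evaluates the ideal error $\sigma^2 \exp(-2C/R)$ and the leading term of the bound, read here as $\epsilon_0^2 [1 + (2/R)(\log N / (\beta N))^{1/2}]$ with all lower-order terms dropped. That reading of the bound, the function names, and the sample parameter values are assumptions made for illustration only.

```python
import math


def ideal_error(sigma2, C, R):
    """Shannon's limiting mean-squared error, sigma^2 * exp(-2C/R)."""
    return sigma2 * math.exp(-2.0 * C / R)


def leading_bound(sigma2, C, R, beta, N):
    """Leading term of the upper bound on the attainable error at delay N.

    Assumption: lower-order terms are ignored, so for small N this is only
    indicative and is not itself a guaranteed bound.
    """
    eps0 = ideal_error(sigma2, C, R)
    return eps0 * (1.0 + (2.0 / R) * math.sqrt(math.log(N) / (beta * N)))


if __name__ == "__main__":
    sigma2, C, R, beta = 1.0, 1.0, 2.0, 1.0   # hypothetical parameter values
    print("ideal error:", ideal_error(sigma2, C, R))
    for N in (10, 100, 1000, 10000):
        print(N, leading_bound(sigma2, C, R, beta, N))
```

The excess over the ideal error shrinks like $(\log N / N)^{1/2}$, so a hundredfold increase in delay reduces it by somewhat less than a factor of ten.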

III. PROOF OF THEOREM 1 

We must establish the existence of a mapping $f$ of Euclidean $n$-space
into a set $\mathcal{S}$ of $M$ $n$-vectors such that $E \| X - f(X) \|^2 / n$ satisfies the
upper bound (9). Let $r_0 = E \| X \|$. It can be shown that with very
high probability, as $n \to \infty$, $X$ will be near the surface of the sphere
of radius $r_0$ with center at the origin. It turns out to be convenient
to establish the existence of a mapping $g$ from the surface $S_{r_0}$ of this
sphere to a set $\mathcal{S}$.

Accordingly we shall construct the mapping $f$ as follows. Let
$X_p = X (E \| X \| / \| X \|) = X (r_0 / \| X \|)$ be the projection of $X$ onto the surface
$S_{r_0}$. Let $g$ be a mapping of $S_{r_0}$ to some set $\mathcal{S}$ of $n$-vectors ($g$ has not
yet been found, of course); then $f = g(X_p)$. The following lemma of
D. J. Sakrison (Ref. 2) is proved in the Appendix.

Lemma 1: For any mapping $g$, and $f = g(X_p)$,

$$E \| X - f(X) \|^2 = \operatorname{Var} \| X \| + E \| X_p - g(X_p) \|^2. \tag{21}$$

Since, as we shall see, $\operatorname{Var} \| X \|$ is relatively small, the principal
contribution to $E \| X - f(X) \|^2$ is $E \| X_p - g(X_p) \|^2$.

Our next task is to find the mapping $g$, and to this end we will establish
a lemma concerning the covering of $S_{r_0}$ by spherical caps. First some
definitions.

Let $w$, $z$ with and without subscripts denote points on $S_{r_0}$, the
surface of a sphere in $n$-space of radius $r_0$. Let $\alpha(w, z)$ be the angle*
between $w$ and $z$. For $0 \le \theta \le \pi$, let $\mathcal{C}(w, \theta) = \{z : \alpha(w, z) \le \theta\}$ be
the spherical cap of half angle $\theta$ centered at $w$. Assign the usual "area"
measure to $S_{r_0}$. If $A \subset S_{r_0}$ is measurable, let $\mu(A)$ be its measure.
In particular, let

$$C_n(\theta) = \mu[\mathcal{C}(w, \theta)] = \frac{(n - 1) \pi^{(n-1)/2} r_0^{n-1}}{\Gamma\!\left( \frac{n+1}{2} \right)} \int_0^{\theta} \sin^{n-2} \varphi \, d\varphi \tag{22}$$

be the area (measure) of a cap of half angle $\theta$,† so that

$$C_n(\pi) = \frac{n \pi^{n/2} r_0^{n-1}}{\Gamma\!\left( \frac{n+2}{2} \right)} \tag{23}$$

is the area of $S_{r_0}$. We now state

Lemma 2:‡ Let $X_p$ be a random vector which is uniformly distributed
on $S_{r_0}$. Let $M$ (a positive integer) and $\theta$ ($0 \le \theta \le \pi$) be arbitrary. Then
(for any dimension $n$) there exists a set of $M$ points $\{w_1, \ldots, w_M\} \subset S_{r_0}$
such that

$$Q(w_1, \ldots, w_M) \triangleq \Pr\Big\{ X_p \notin \bigcup_{j=1}^{M} \mathcal{C}(w_j, \theta) \Big\} \le \exp[-M C_n(\theta) / C_n(\pi)]. \tag{24}$$

* The angle $\alpha(w, z)$ is defined by $\cos \alpha = (w, z) / \| w \| \, \| z \|$, where $(w, z)$ is
the inner product and $0 \le \alpha \le \pi$.

† It is shown in Ref. 5 that $A_n(r)$, the surface area of an $n$-sphere of radius $r$,
is given by $A_n(r) = n \pi^{n/2} r^{n-1} / \Gamma[(n + 2)/2]$. Equation (22) follows from the fact
that $C_n(\theta) = \int_0^{\theta} (r_0 \, d\varphi) A_{n-1}(r_0 \sin \varphi)$.

‡ Lemma 2 is related to a result on the covering of the $n$-sphere in Ref. 6.

Proof: Let us define the function

$$F(w_1, \ldots, w_M, z) = \begin{cases} 1 & \text{if } z \notin \bigcup_{j=1}^{M} \mathcal{C}(w_j, \theta), \\ 0 & \text{otherwise.} \end{cases} \tag{25}$$

Then for a fixed set $w_1, \ldots, w_M$,

$$Q(w_1, \ldots, w_M) = \Pr\Big\{ X_p \notin \bigcup_{j=1}^{M} \mathcal{C}(w_j, \theta) \Big\} = E F(w_1, \ldots, w_M, X_p) = \frac{1}{C_n(\pi)} \int_{S_{r_0}} F(w_1, \ldots, w_M, z) \, d\mu(z). \tag{26}$$

Now consider a random experiment in which the $M$ points $w_1, \ldots, w_M$
are random vectors chosen independently with uniform distribution
on $S_{r_0}$. $Q$ is now a random variable given by

$$Q(W_1, \ldots, W_M) = \frac{1}{C_n(\pi)} \int_{S_{r_0}} F(W_1, \ldots, W_M, z) \, d\mu(z), \tag{27}$$

where upper case $W$'s represent random vectors. We now compute
$EQ(W_1, \ldots, W_M)$, the average of $Q$ over all choices of $W_1, \ldots, W_M$.
We will show that $EQ \le \exp[-M C_n(\theta) / C_n(\pi)]$. Since at least one set
$\{w_1, \ldots, w_M\}$ must satisfy $Q(w_1, \ldots, w_M) \le EQ$, the lemma will
follow.

We can write

$$EQ = E \frac{1}{C_n(\pi)} \int_{S_{r_0}} F(W_1, \ldots, W_M, z) \, d\mu(z) = \frac{1}{C_n(\pi)} \int_{S_{r_0}} d\mu(z) \, E F(W_1, \ldots, W_M, z), \tag{28}$$

the interchange of expectation and integration being justified by the
fact that $F \ge 0$. As indicated in (28), $E F(W_1, \ldots, W_M, z)$ is computed
with $z$ held fixed. Now

$$E F(W_1, \ldots, W_M, z) = \Pr\{F = 1\} = \Pr \bigcap_{j=1}^{M} \{\alpha(W_j, z) > \theta\} = \left( 1 - \frac{C_n(\theta)}{C_n(\pi)} \right)^{M} \le \exp[-M C_n(\theta) / C_n(\pi)], \tag{29}$$

so that

$$E(Q) \le \exp[-M C_n(\theta) / C_n(\pi)] \int_{S_{r_0}} \frac{d\mu(z)}{C_n(\pi)} = \exp[-M C_n(\theta) / C_n(\pi)], \tag{30}$$

which concludes the proof.
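
The quantities in Lemma 2 are easy to evaluate numerically. The sketch below computes the cap fraction $C_n(\theta)/C_n(\pi)$ as a ratio of integrals of $\sin^{n-2}\varphi$ (the constant factors in (22) cancel), compares the exact average $(1 - C_n(\theta)/C_n(\pi))^M$ for random centers with the exponential bound of (29), and evaluates the asymptotic form $\sin^{n-1}\theta / (\sqrt{2\pi n} \cos\theta)$ attributed to Shannon. The midpoint-rule integration, the step count, and all function names are illustrative assumptions, not part of the paper.

```python
import math


def cap_fraction(n, theta, steps=20000):
    """C_n(theta) / C_n(pi) for n >= 2, computed as the ratio of
    integrals of sin^(n-2) over [0, theta] and [0, pi] (midpoint rule)."""
    def integral(upper):
        h = upper / steps
        return h * sum(math.sin((i + 0.5) * h) ** (n - 2) for i in range(steps))
    return integral(theta) / integral(math.pi)


def uncovered_probability(M, p):
    """Average probability that a uniform point is missed by M independent
    random caps, each of fractional area p, and its exponential upper bound."""
    return (1.0 - p) ** M, math.exp(-M * p)


def shannon_asymptotic(n, theta):
    """Large-n approximation of the cap fraction for theta below pi/2."""
    return math.sin(theta) ** (n - 1) / (math.sqrt(2.0 * math.pi * n) * math.cos(theta))


if __name__ == "__main__":
    p = cap_fraction(10, math.pi / 4)
    print("cap fraction, n = 10, theta = pi/4:", p)
    print("exact average vs. bound, M = 50:", uncovered_probability(50, p))
    print("exact / asymptotic, n = 100:",
          cap_fraction(100, math.pi / 4) / shannon_asymptotic(100, math.pi / 4))
```

For $n = 3$ the cap fraction reduces to $(1 - \cos\theta)/2$, which gives a quick sanity check on the integration.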

We now give the construction of $g$. Let $M = M_n = \exp[n(R_0 - a_n)]$
as in Theorem 1, and let $\theta = \theta_n = \arcsin \exp(-R_0 + \delta_n)$, where
$\delta_n > 0$ will be specified below. Let $\{w_1, w_2, \ldots, w_M\}$ be a set which
satisfies (24) for these $\theta$ and $M$. Let $x \in S_{r_0}$ and say $x \in \bigcup_{j=1}^{M} \mathcal{C}(w_j, \theta)$.
Let $j_0$ be the smallest index $j$ such that $x \in \mathcal{C}(w_j, \theta)$. Then

$$g(x) = (\cos \theta) \, w_{j_0} \tag{31}$$

(see Figure 3), and $\| g(x) - x \| \le r_0 \sin \theta$. If $x \notin \bigcup_{j=1}^{M} \mathcal{C}(w_j, \theta)$, then
let $g(x) = w_1$. In this case $\| x - g(x) \| \le 2 r_0$. Hence

$$E \| X_p - g(X_p) \|^2 \le r_0^2 \sin^2 \theta + 4 r_0^2 \, Q(w_1, w_2, \ldots, w_M). \tag{32}$$

Since the set $\{w_j\}$ satisfies (24),

$$E \| X_p - g(X_p) \|^2 \le r_0^2 \sin^2 \theta_n + 4 r_0^2 \exp[-M C_n(\theta) / C_n(\pi)]. \tag{33}$$

Fig. 3. Construction of $g(x)$. The solid line is the cap $\mathcal{C}(w_j, \theta)$.
Combining (33), Lemma 1, and the fact (proved in the Appendix)
that

$$r_0 \le \sigma \sqrt{n} \quad \text{and} \quad \frac{1}{n} \operatorname{var} \| X \| \le 0.92 \sigma^2 / n, \tag{34}$$

we obtain

$$\frac{1}{n} E \| X - f(X) \|^2 \le \sigma^2 \sin^2 \theta_n + 4 \sigma^2 \exp[-M_n C_n(\theta_n) / C_n(\pi)] + 0.92 \sigma^2 / n, \tag{35}$$

where $M_n = \exp[n(R_0 - a_n)]$ and $\theta_n = \arcsin \exp(-R_0 + \delta_n)$, where
$\delta_n > 0$ is to be specified. Our final step is selection of $\delta_n$ so that (9) is
satisfied. For $\theta_n$ bounded away from $\pi/2$, Shannon (equation 27 of
Ref. 4) has shown that as $n \to \infty$,

$$\frac{C_n(\theta_n)}{C_n(\pi)} \sim \frac{1}{\sqrt{2 \pi n}} \, \frac{\sin^n \theta_n}{\cos \theta_n \sin \theta_n} \ge \frac{1}{\sqrt{2 \pi n}} \sin^n \theta_n, \tag{36}$$

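so that (using the definitions of $M_n$ and $\theta_n$) for $n$ sufficiently large

$$\frac{M_n C_n(\theta_n)}{C_n(\pi)} \ge \frac{1}{2} \, \frac{\exp[n(\delta_n - a_n)]}{\sqrt{2 \pi n}}. \tag{37}$$

We now define $\delta_n$ by

$$\delta_n = a_n + \frac{1}{2} \frac{\log n}{n} + \frac{\log \log n}{n} + \frac{\log 2\sqrt{2\pi}}{n}. \tag{38}$$

Then, from (37), for $n$ sufficiently large

$$\exp[-M_n C_n(\theta_n) / C_n(\pi)] \le 1/n. \tag{39}$$

Finally, we have

$$\exp(2 \delta_n) = 1 + 2 \delta_n + O(\delta_n^2) = 1 + 2 \delta_n + O(a_n^2) + O\!\left[ \left( \frac{\log n}{n} \right)^2 \right]. \tag{40}$$

Combining (35), (39), and (40), and noting that $\sin^2 \theta_n = e^{-2R_0} \exp(2\delta_n)$, we obtain

$$\frac{1}{n} E \| X - f(X) \|^2 \le \sigma^2 e^{-2R_0} \left( 1 + 2 a_n + \frac{\log n}{n} \right) + O\!\left( \frac{\log \log n}{n} \right) + O(a_n^2), \tag{41}$$

which is (9).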
Inequality (10) follows from (31) and (34) simply by noting that

$$\| f(X) \| = \| g(X_p) \| = r_0 \cos \theta < r_0 \le \sqrt{n \sigma^2}. \tag{42}$$



APPENDIX 

In this appendix we establish some facts about the random vector
$X = (X_1, X_2, \ldots, X_n)$ whose coordinates are independent zero-mean
Gaussian variates with variance $\sigma^2$.

Proposition:

(i) $E \| X \| \le (n \sigma^2)^{1/2}$
(ii) $\operatorname{var} \| X \| \le 0.92 \sigma^2$

Proof: (i) follows from the Schwarz inequality: $[E(1 \cdot \| X \|)]^2 \le E \| X \|^2 \, E 1^2 = E \| X \|^2 = n \sigma^2$.
To establish (ii), consider the $n$-fold
joint probability density for the vector $(R, \Phi_1, \ldots, \Phi_{n-1})$, the polar
coordinate representation of $X$,*

$$g(r, \varphi_1, \ldots, \varphi_{n-1}) = \exp(-r^2 / 2\sigma^2) \, r^{n-1} h(\varphi_1, \ldots, \varphi_{n-1}), \tag{43}$$

where $h(\varphi_1, \ldots, \varphi_{n-1}) = (2 \pi \sigma^2)^{-n/2} \cos^{n-2} \varphi_1 \cos^{n-3} \varphi_2 \cdots \cos \varphi_{n-2}$.

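The marginal distribution for the random variable $R = \| X \|$ is

$$f(r) = \frac{2}{(2 \sigma^2)^{n/2} \, \Gamma(n/2)} \, r^{n-1} \exp(-r^2 / 2\sigma^2),$$

so that an integration yields

$$E \| X \| = E(R) = \sqrt{2 \sigma^2} \, \Gamma\!\left( \frac{n+1}{2} \right) \Big/ \Gamma\!\left( \frac{n}{2} \right).$$

Since $E \| X \|^2 = n \sigma^2$, we have

$$\operatorname{var} \| X \| = n \sigma^2 \left[ 1 - \frac{2}{n} \left( \frac{\Gamma\!\left( \frac{n+1}{2} \right)}{\Gamma\!\left( \frac{n}{2} \right)} \right)^{2} \right]. \tag{44}$$

Using the Stirling formula,

$$\sqrt{2\pi} \, x^{x - 1/2} e^{-x} \left[ 1 + \frac{1}{12x} - \frac{1}{360 x^3} \right] \le \Gamma(x) \le \sqrt{2\pi} \, x^{x - 1/2} e^{-x} \exp\!\left( \frac{1}{12x} \right), \tag{45}$$

to underestimate $\Gamma((n+1)/2)$ and overestimate $\Gamma(n/2)$ we obtain

$$\frac{\operatorname{Var} \| X \|}{n \sigma^2} \le 1 - \frac{1}{e} \left( 1 + \frac{1}{n} \right)^{n} \left( 1 - \frac{7}{36 n (n+1)} - \frac{1}{90 (n+1)^2} \right)^{2}. \tag{46}$$

* See, for example, M. G. Kendall and A. Stuart, The Advanced Theory of
Statistics, vol. 1, London: Griffin, 1963, Section 11.2.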

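Further,

$$\frac{1}{e} \left( 1 + \frac{1}{n} \right)^{n} = \exp[-1 + n \log(1 + 1/n)] \ge \exp[-1 + n(1/n - 1/2n^2)] = \exp(-1/2n) \ge 1 - \frac{1}{2n},$$

and

$$1 - \frac{7}{36 n (n+1)} - \frac{1}{90 (n+1)^2} \ge 1 - \frac{1}{n^2} \left( \frac{7}{36} + \frac{1}{90} \right) \ge 1 - \frac{0.21}{n^2}.$$

Hence, since $(1 - 0.21 n^{-2})^2 \ge 1 - 0.42 n^{-2}$,

$$\frac{\operatorname{Var} \| X \|}{n \sigma^2} \le 1 - (1 - 0.5 n^{-1})(1 - 0.42 n^{-2}) \le \frac{0.5}{n} + \frac{0.42}{n^2} = \frac{0.5}{n} \left( 1 + \frac{0.84}{n} \right) \le \frac{0.5 (1.84)}{n} = \frac{0.92}{n}.$$

This is (ii).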
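
Part (ii) of the Proposition can be checked numerically from the closed form $\operatorname{var} \| X \| = \sigma^2 [n - 2 (\Gamma((n+1)/2) / \Gamma(n/2))^2]$. The sketch below is only a sanity check over a finite range of $n$, not a substitute for the proof; the function name and the use of `math.lgamma` to avoid overflow are choices made here.

```python
import math


def var_norm_over_sigma2(n):
    """var ||X|| / sigma^2 for an n-vector of independent zero-mean
    Gaussian variates, using log-gamma to keep large n finite."""
    log_ratio = math.lgamma((n + 1) / 2.0) - math.lgamma(n / 2.0)
    return n - 2.0 * math.exp(2.0 * log_ratio)


if __name__ == "__main__":
    for n in (1, 2, 10, 100, 1000):
        print(n, var_norm_over_sigma2(n))
```

In this range the ratio stays below 1/2, so the constant 0.92 leaves a comfortable margin.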
We now give a 

Proof of Lemma 1:

$$E \| X - f(X) \|^2 = E \| X - X_p + X_p - g(X_p) \|^2 = E \| X - X_p \|^2 + E \| X_p - g(X_p) \|^2 + 2 E (X - X_p, \, X_p - g(X_p)), \tag{47}$$

where $(u, v)$ is the inner product of the $n$-vectors $u$ and $v$. Now
$\| X - X_p \|^2 = (\| X \| - E \| X \|)^2$, so that

$$E \| X - X_p \|^2 = \operatorname{var} \| X \|. \tag{48}$$

Further, the inner product

$$(X - X_p, \, X_p - g(X_p)) = (X, X_p) - (X, g(X_p)) - (X_p, X_p) + (X_p, g(X_p)) = \| X \| r_0 - r_0^2 - \| X \| \, \| g(X_p) \| \cos \alpha_1 + r_0 \| g(X_p) \| \cos \alpha_2, \tag{49}$$

where $\alpha_1$ is the angle between $X$ and $g(X_p)$, and $\alpha_2$ is the angle between
$X_p$ and $g(X_p)$. Since $X$ and $X_p$ are colinear, $\alpha_1 = \alpha_2 = \alpha(X_p)$, a function
of $X_p$. Now from (43) we see that the random variable $R = \| X \|$
and the vector $(\Phi_1, \ldots, \Phi_{n-1})$ are statistically independent. Since $X_p$
depends only on $\Phi_1, \ldots, \Phi_{n-1}$ and not on $R$, we conclude that $\| X \|$ is
independent of $g(X_p)$ and $\alpha(X_p)$. Thus from (49), and since $E \| X \| = r_0$,

$$E (X - X_p, \, X_p - g(X_p)) = r_0 E \| X \| - r_0^2 - E \| X \| \, E[\| g(X_p) \| \cos \alpha(X_p)] + r_0 E[\| g(X_p) \| \cos \alpha(X_p)] = 0. \tag{50}$$

Equations (47), (48) and (50) imply Lemma 1.